Spatial orientations of visual word pairs to improve Bag-of-Visual-Words model
Abstract
This paper presents a novel approach for incorporating spatial information into the bag-of-visual-words (BoVW) model [1, 3] for category-level and scene classification. In the traditional BoVW model, feature vectors are histograms of visual words. This representation is appearance-based and contains no information about the arrangement of the visual words in the 2D image space. Within this framework, we present a simple and efficient way to infuse spatial information. In particular, we are interested in explicit global relationships among the spatial positions of visual words. To this end, we first introduce the notion of Pair of Identical visual Words (PIW), defined as the set of all pairs of visual words of the same type. The spatial distribution of words is then represented as a histogram of the orientations of the segments formed by the PIW. Figure 1 shows an example that gives an intuition for our approach. Our method eliminates a number of drawbacks of previous approaches [2, 3] by i) proposing a simpler word selection technique that supports fast, exhaustive extraction of spatial information, ii) enabling the infusion of global spatial information, and iii) being robust to geometric transformations such as translation and scaling.

In the conventional BoVW model, each image is represented by a set of local descriptors {d1, ..., dn} extracted from n patches around interest points or on a regular grid. A visual vocabulary W = {w1, w2, w3, w4, ..., wN} is obtained by clustering a set of descriptors from all the training images. Here, N is a predefined number that sets the size of the vocabulary. Each patch of the image is then mapped to the nearest visual word according to the following equation:
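The two steps described in the abstract, nearest-word mapping and the PIW orientation histogram, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-word mapping is assumed to use Euclidean distance (the equation is not reproduced in this excerpt), segment orientations are assumed to be undirected angles in [0, π), and the function names and the bin count `n_bins` are hypothetical.

```python
import numpy as np


def assign_words(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word.

    Assumption: "nearest" means smallest Euclidean distance.
    descriptors: array of shape (n, d); vocabulary: array of shape (N, d).
    """
    # Pairwise squared distances between descriptors and words, shape (n, N).
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return np.argmin(sq_dists, axis=1)


def piw_orientation_histogram(positions, words, word_id, n_bins=9):
    """Histogram of orientations of segments joining identical visual words.

    positions: array of shape (n, 2) with the (x, y) location of each patch.
    words: array of shape (n,) with the visual word index of each patch.
    word_id: the visual word whose pairs (the PIW) are considered.
    n_bins: hypothetical number of orientation bins over [0, pi).
    """
    pts = positions[words == word_id]
    hist = np.zeros(n_bins)
    if len(pts) < 2:
        return hist  # no pair can be formed

    # All unordered pairs (i < j) of patches assigned to this word.
    i, j = np.triu_indices(len(pts), k=1)
    dx = pts[j, 0] - pts[i, 0]
    dy = pts[j, 1] - pts[i, 1]

    # Undirected orientation of each segment, folded into [0, pi).
    theta = np.mod(np.arctan2(dy, dx), np.pi)

    # Clamp guards against theta rounding up to pi in floating point.
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    np.add.at(hist, bins, 1)
    return hist
```

Because the histogram depends only on the directions of the segments, shifting or uniformly scaling the patch positions leaves it unchanged, which is consistent with the robustness to translation and scaling stated in the abstract.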
Similar resources
A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features
Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we propose an image retrieval method using global and local features. First, for local feature extraction, the SURF algorithm produces a set of interest points for each image and a set of 64-dimensional descriptors for each interest point, and then, to use the Bag of Visual Words model, a cluster...
Crowded Pedestrian Detection and Density Estimation by Visual Words Analysis
Crowded pedestrian detection and density estimation are useful and important in transportation environments. In this paper, we present a novel method for crowded pedestrian detection and density estimation through a weighting scheme for the bag-of-visual-words model that characterizes both the weight and the relative spatial arrangement of the visual words depicting an image. Firstly,...
Spatial Weighting for Bag-of-Visual-Words and Its Application in Content-Based Image Retrieval
Retrieving images from a large and highly varied image data set based on their visual contents is a challenging and important task. Problems such as how to bridge the semantic gap between image features and the user have attracted a lot of attention from the research community. Recently, the 'bag of visual words' approach has exhibited very good performance in content-based image retrieval (CBIR). Ho...
Egocentric Activity Recognition Using Bag of Visual Words
This paper presents an approach for recognizing activities using video from an egocentric setup. Instead of using intermediate steps such as object detection or pose estimation, this approach models the spatial distribution of visual words. The interactions are encoded using a histogram-oriented pairwise relation (HOPR) between the visual words, orientations and alignments. A...